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Abstract 

Background: Changes in microRNA (miRNA) expression patterns liave been extensively characterized in several 
cancers, including human colon cancer. However, how these miRNAs and their putative mRNA targets contribute 
to the etiology of cancer is poorly understood. In this work, a bioinformatics computational approach with miRNA 
and mRNA expression data was used to identify the putative targets of miRNAs and to construct association 
networks between miRNAs and mRNAs to gain some insights into the underlined molecular mechanisms of 
human colon cancer. 

Method: The miRNA and mRNA microarray expression profiles from the same tissues including 7 human colon 
tumor tissues and 4 normal tissues, collected by the Broad Institute, were used to identify significant associations 
between miRNA and mRNA. We applied the partial least square (PLS) regression method and bootstrap based 
statistical tests to the joint expression profiles of differentially expressed miRNAs and mRNAs. From this analysis, we 
predicted putative miRNA targets and association networks between miRNAs and mRNAs. Pathway analysis was 
employed to identify biological processes related to these miRNAs and their associated predicted mRNA targets. 

Results: Most significantly associated up-regulated mRNAs with a down-regulated miRNA identified by the 
proposed methodology were considered to be the miRNA targets. On average, approximately 16.5% and 11.0% of 
targets predicted by this approach were also predicted as targets by the common prediction algorithms 
TargetScan and miRanda, respectively. We demonstrated that our method detects more targets than a simple 
correlation based association. Integrative mRNA:miRNA predictive networks from our analysis were constructed 
with the aid of Cytoscape software. Pathway analysis validated the miRNAs through their predicted targets that 
may be involved in cancer-associated biological networks. 

Conclusion: We have identified an alternative bioinformatics approach for predicting miRNA targets in human 
colon cancer and for reverse engineering the miRNA:mRNA network using inversely related mRNA and miRNA joint 
expression profiles. We demonstrated the superiority of our predictive method compared to the correlation based 
target prediction algorithm through a simulation study. We anticipate that the unique miRNA targets predicted by 
the proposed method will advance the understanding of the molecular mechanism of colon cancer and will 
suggest novel therapeutic targets after further experimental validations. 



Background 

Colon cancer is the third most common cancer in the 
United States [1] and contributes to over 655,000 deaths 
per year worldwide. However when diagnosed early, it is 
one of the most treatable cancers [2]. Many efforts are 
focused on detection rates and screening utilization 
[3,4]. The primary treatment for colon cancer may 
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involve surgery, chemotherapy, biological therapy or 
radiation [5]. Unfortunately, these treatments often 
damage normal cells and tissues, so side effects are 
common as are the possibilities of drug resistance and 
disease recurrence [6]. Therefore, the identification of 
new biomarkers for early diagnosis and prognosis of 
human colon cancer is of great interest. Furthermore, 
such biomarkers might be useful for the development of 
novel therapeutics. 

Recently miRNAs have been suggested to be among 
these potential biomarkers [7]. The miRNAs are short 
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(~21nt) non-coding RNAs that regulate gene expression 
by causing transcript degradation or translational repres- 
sion [8,9]. In animals, they have been found to regulate 
a wide range of biological processes such as stem cell 
maintenance, development, metabolism, host-viral inter- 
action, differentiation, proliferation and apoptosis [10]. 
Their activities are also implicated in cancer progression 
or suppression by affecting oncogenesis, growth, inva- 
sion and metastasis [11]. Studies suggest that three of 
these miRNAs, miR-15a, miR-16 [12,13] and let-7 
[14,15] can function as tumor suppressors, while miR- 
155 and miR-21 play roles in oncogenesis [16,17]. Many 
studies have also shown that miRNAs play a critical role 
in cancer initiation and progression. For example, 
miRNA-135a and miR-135b are reported to be involved 
in the initiation of human colon cancer by targeting the 
adenomatous polyposis coli gene (APC) [18]. Alterations 
in miRNA expression patterns are commonly associated 
with numerous human cancers, such as colorectal can- 
cer and chronic lymphocytic leukemia [19]. Lu et al. 
used 218 miRNAs expression profiles from 334 samples, 
including multiple human cancers, to obtain information 
on the developmental lineage and differentiation state of 
the tumors [20]. They found most miRNAs in tumors 
were down-regulated compared to expression levels in 
normal tissues. Thus, miRNA expression profiles may 
be a useful tool in human cancer diagnosis and in 
designing treatments strategies. Nevertheless, specific 
targets of the abnormally expressed miRNAs and their 
biological functions remain largely unknown. So, accu- 
rately identifying those genes regulated by such miRNAs 
and their biological functions in affecting cancer devel- 
opment and progression is of utmost importance. 

Although, an increasing number of miRNA targets 
have been validated experimentally http://diana.cslab. 
ece.ntua.gr/tarbase, a majority of miRNA targets are 
currently unknown and bioinformatics algorithms 
remain the primary means of predicting these putative 
targets. The principles of these algorithms are based on 
sequence complementarity, species conservation, ther- 
modynamic stability, and site accessibility. Currently, 
there are eight widely used algorithms for miRNA target 
prediction (DIANA-microT 3.0, EIMMo, miRanda, miR- 
Base, PicTar, PITA, RNA22 and TargetScan 5.1) [11]. 
However, the utility of these computational techniques 
is limited by various factors including miRNA size, iden- 
tification of 3' UTR and conservation analysis [19]. 
Furthermore, to our knowledge, none of these algo- 
rithms incorporate high-throughput microarray gene 
expression profiling data to predict miRNA targets. 

As many miRNAs initiate the degradation of target 
mRNA transcripts [8], simultaneous expression profiles 
of miRNAs and mRNAs should reveal the existence of 
such inverse relationships. Huang et al. in 2007 was the 



first to utilize paired expression profiles of miRNAs and 
mRNAs to identify functional miRNA targets using a 
Bayesian data analysis algorithm (GeneMiR++) [21]. 
More recently, a similar strategy was used to identify 
hepatitis C virus (HCV) -associated miRNA- mRNA regu- 
latory modules [22]. However, the integrative computa- 
tional method used in that paper was based on the 
similarities in the expression profiles of miRNAs and 
mRNAs across samples to calculate a simple miRNA- 
mRNA correlation matrix. This pairwise correlation 
method may grossly undermine the existing coUineari- 
ties amongst the covariates. The complex interrelation- 
ship of multiple miRNAs influencing the expression of 
one mRNA can be modeled by a regression method. 
However, due to the large number of miRNAs com- 
pared to the sample size and the existing coUinearities 
amongst the covariates (miRNAs) it is impossible to use 
an ordinary linear regression model. So we use PLS 
regression model to cope with these issues. Associations 
between gene expression profiles have been previously 
studied using a PLS regression approach in [23]. Since 
then several others have proposed using PLS in the 
microarray and proteomics contexts [24-27]. Standard 
linear regression is not applicable for microarray gene 
expression data where the number of covariates far 
exceeds the sample size. PLS is a method for construct- 
ing predictive models when there are numerous covari- 
ates and many of them are highly coUinear. To the best 
of our knowledge, a PLS regression method has not 
been used previously to integrate miRNA and mRNA 
microarray data to predict miRNA targets. 

Thus, the PLS regression method was used to explore 
the likely associations/interactions between miRNA and 
mRNA using expression data previously collected from 
7 colon tumors [20]. We have used Pathway analysis to 
understand the biological processes linked to these miR- 
NAs and their predicted target mRNAs in the context 
of colorectal cancer. 

Results 

Outline of our analysis 

We used a multi-step approach in our analysis. In STEP 
1, an un-paired t-test was used with FDR correction 
methods to identify differentially expressed miRNAs and 
mRNAs in colon tumors and normal colon tissues. In 
STEP 2, the PLS regression method and statistical tests 
on the association measures were performed. Repeated 
bootstrap samples were used to identify significant asso- 
ciations/interactions between miRNA and mRNA using 
31 down-regulated miRNAs and 71 up-regulated 
mRNAs. This PLS-bootstrap algorithm work-flow is out- 
lined in the Figure 1. In STEP 3, we compared the pre- 
dictions from our PLS regression method with those 
from TargetScan 5.1 and miRanda. In STEP 4, a gene 
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1. Ran PLS with 71 
mRNA&31 miRNA, 
calculated the 
association scores and 
stored them in a 71x31 
matrix 



i. i. i. i. 

73 73 73 

-z.-z.-z.-z. 
> > > > 



mRNAl 
mRNA2 
mRNA3 
mRNA4 
mRNAS 



2. Removed 1 column of the X 
variable (miRNA expressions) at 
a time, ran PLS & recorded the 

predicted values of Y (mRNA 
expressions) 



6. Calculated the proportion of times a specific 
association score is less than the observed 
association scores amongst all the 1000 
bootstrapped association scores, which were the 
estimated p-value. A small p-value implied strong 
miRNA-mRNA association/interaction 



i 



7. A Benjamini & Hochberg FDR correction to obtain 
an adjusted p-value. The mRNA which had strong 
association with a miRNA is deemed to be the target 
of that miRNA 



5. Ran PLS with new Y 
variables and 31 miRNAs 
and calculated the new 
associated scores and 
repeated bootstrarpping 
for 1000 times 



3. Calculated the 
residuals by subtracting 

the predicted values 
from the observed ones 



i 



4. Bootstrapped by re- 
sampling from residuals 
and calculated new Y 
variables 



Figure 1 Algorithm to find miRNA targets. Illustrated is a workflow involving PLS and statistical testing to find the significant mRNA targets 
for each down-regulated miRNA. 



ontology analysis was constructed with the aid of Path- 
way Studio software. In STEP 5, miRNA-mRNA inter- 
acting networks were developed with the aid of 
Cytoscape 2.7 software. In STEP 6 and 7, IPA and IPA 
pathway designer were used to create a functional analy- 
sis of biological processes related to miRNAs and their 
target mRNAs. Finally, in STEP 8, a simulation was per- 
formed to evaluate the false detection rate for PLS. 
Steps 1-3 can be considered as the discoveries after 
mining the real data and Steps 6-8 correspond to the 
validities of these mined results. 

STEP 1. Differentially expressed miRNA and mRNA in 
human colon tissues 

The miRNA and mRNA microarray expression profiles 
from the same tissues including 7 human colon tumor 
tissues and 4 normal tissues were analyzed with the aid 
of GeneSpring GXIO software (Agilent, CA). The Benja- 
mini & Hochberg FDR (false discovery rate) was used to 
obtain adjusted p-values for unpaired t- tests [28]. Differ- 
entially expressed miRNAs and mRNAs with adjusted p- 
values less than or equal to 10% and the absolute fold 
changes greater than or equal to 1.2 were obtained. We 
found a total of 31 down-regulated miRNAs and 2 up- 
regulated miRNAs that fit these criteria (Additional file 
1, table SI). In addition, we found 73 up-regulated 



mRNAs and 63 down-regulated mRNAs (Additional file 
2, table S2). Some of the mRNAs did not have gene IDs. 
There were only two up-regulated miRNAs using the 
above mentioned cutoff. Therefore, in this study we 
selected the 31 down-regulated miRNAs and the 71 up- 
regulated mRNAs to find the most likely miRNA:mRNA 
negative/inverse associations with the aid of the PLS 
regression method. Among the differentially expressed 
miRNAs, only miR-135b and miR-221 are significantly 
up-regulated (4.3 and 1.7 fold, respectively) in the colon 
cancer tissue samples. The remaining 31 miRNAs 
(Additional file 1, table SI) were significantly down- 
regulated in these samples. 

STEP 2. miRNAimRNA Associations predicted by PLS 

To identify the most likely prospective targets the 31 
down-regulated miRNAs were used as independent vari- 
ables and the 71 mRNAs were denoted as dependent 
variables in the PLS regression analysis. Subsequently, 
statistical tests were performed to identify significant 
negative associations between each mRNA with the 31 
down-regulated miRNAs. The algorithm flow diagram 
for this multi-step procedure is shown in Figure 1. 

Table 1 lists the 31 down-regulated miRNAs and their 
associated target mRNAs predicted by the PLS regres- 
sion method. The maximum number of predicted 
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Table 1 Identification of 31 down -regulated miRNAs from a list of 71 up-regulated mRNAs 



Down-regulated 
miRNA 


miRNA 
family 


No. of mRNA 
targets 


Target genes predicted by the PLS method 


*hsa-let-7 g 


Let-7 g 


17 


BUD31, COTLl, CPNEl, EIF3B, FTL, GTF3C1, IFITMl, NAP1L4, NUDT1, PHB, PSMC2, PSMC4, PUF60, Ras- 
Like protein, RPLIO, SRD5A3, WDR36 


hsa-miR-1 


miR-1 


1 


RPLIO 


hsa-miR-100 


miR-100 


1 


GSTPl 


*hsa-miR-101 


miR-101 


7 


ATP6V0B, GTF2IRD1, GUKl, NEBL, PSMB4, UBE2I, VKORCl 


nsa-miK- 1 UD 


miK- 1 UD 


2 


IVllr, rblVlLz 


nsa-miK- 1 zo 


m IK- 1 ZD 


1 6 


AMVAo r'CTD niP\iT/i r'DATr'u^ ( — miDP\i r\\V'\ ^/lP\l/ mpiI/ mcdi ^\K~7^ dcn/id/i cinnAii ci r — 7Aii 
ANaAz, Lb 1 D, UUI 1 4, UrA 1 LHo, U 1 rzinU 1 , bUK 1 , iVlUK, NUK, NLdL, UAZ I , rblVlD4, b 1 UUA 1 1 , oLL/A 1 1 , 

TBCD, UBA6, VKORCl 


hsa-miR-1 30a 


miR-1 30 


2 


GSTPl, RPL34 


*hsa-miR-133a 


miR-1 33 


17 


ANXA2, ATP6V0B, CSTB, GPATCH3, GPXl, GTF2IRD1, NAP1L4, NEBL, OAZl, P0LR2H, PSMB4, RPLIO, 
SLC7A1 1, TBCD, UBA6, UBE2I, VKORCl 


hsa-miR-142-5p 


miR-1 42 


6 


ARPCIB, DDIT4, GSTPl, MDK, TUFM, USP22 


*hsa-miR-150 


miR-1 50 


20 


ABHD12, C7orf28A, CLIC1, CPNEl, EIF3B, GSTPl, GTF3C1, IFITMl, LAPTM5, NAP1L4, 0DC1, PRDX2, 
PSMB2, PSMC4, PSMD8, PUF60, RPLIO, SRD5A3, TBCD, Wilm's tumor-related protein 


*hsa-miR-15a 


miR-1 5 


8 


C7orf28A, CAPG, CPNEl, GTF3C1, NUDTl, PSMD8, WDR36, Wilm's tumor-related protein 


*hsa-miR-l 6 


min- 1 0 


Q 
O 


r 1 L, oiroLi, LAr MvD, rnb, roiviuo, onUjAo, vvunoD, vviim s tumor-reiateo protein 


nsa-miK- 1 yj 


min- 1 yj 


z 


DQNyiPiQ DDI 1 n 
rolVlUo, KrL 1 U 


hsa-miR-26a 


miR-26 


3 


CAPG, FTL, GTF3C1 


hsa-miR-26b 


miR-26 


15 


ATP6V0B, BUD31, CANX, CAPG, EIF3B, FTL, MIF, NAP1L4, NUDTl, ODCl, P0LR2H, PSMC2, PSMC4, 

UBE2I, WDR36 


hsa-miR-28 


miR-28 


21 


ABHD12, ARPCIB, CLICl, CPNEl, EIF3B, FTL, GTF3C1, IFITMl, LAPTM5, NAP1L4, ODCl, PRDX2, PSMB2, 
PSMC4, PSMD8, PUF60, RPLIO, RPS9, SRD5A3, TUFM, Wilm's tumor-related protein 


*hsa-miR-29b 


miR-29 


15 


ANXA2, CDC42SE1, CSTB, DDIT4, GNPTG, GPATCH3, GPXl, ITPA, MDK, PSMB4, SlOOAl 1, SLC7A1 1, 

USP22, VKORCl, ZNF37A 


*hsa-miR-29c 


miR-29 


7 


ABHD12, C7orf28A, PSMD8, RPLIO, TBCD, USP22, ZNF37A 


*hsa-miR-30a 


miR-30 


22 


ANXA2, C7orf28A, CAPG, CSTB, DAP, GPATCH3, GPXl, GTF2IRD1, GUKl, ITPA, MMPll, NEBL, NUDTl, 
OAZl, P0LR2H, PSMB4, SLC7A1 1, TBCD, UBA6, UBE2I, VKORCl, ZNF37A 


*hsa-miR-30b 


miR-30 


20 


ANXA2, CDC42SE1, CSTB, DDIT4, GNPTG, GPATCH3, GPXl, GPX4, GSTPl, GTF2IRD1, ITPA, MDK, NDK, 
OAZl, PSMB4, SlOOAl 1, SLC7A1 1, USP22, VKORCl, ZNF37A 


*hsa-miR-30c 


miR-30 


15 


ANXA2, CSTB, GPATCH3, GSTPl, GTF2IRD1, ITPA, MDK, NDK, OAZl, PSMB4, SlOOAl 1, SLC7A1 1, UBA6, 

VKORCl, ZNF37A 


*hsa-miR-30d 


miR-30 


25 


ANXA2, ATP6V0B, BHLHE40, C7orf28A, CAPG, CDC42SE1, CSTB, DAP, DDIT4, GPATCH3, GPXl, 

( — mi Dm N/llH N/1N/1D11 MCDI D/^l DOU DCN/ID/I DDI in CI C\C\ All Q\ ( — 7A11 TDr'Pi IID A ^ II DUO 1 

UlrzlKUl, IVllr, IVllVlr 1 1, NLbL, rULKzH, rolVlD4, KrL ID, o 1 UUA 1 1, oLL/A 1 1, 1 dLU, UdAo, UdLzI, 

VKORCl, ZNF37A 


*hsa-iniR-30e 


miK-oU 


1 7 


ATD^\/nD DUI UC/in r~lr^v^^QK r K^r CQTU P\AD (^DATr"UQ /^DVI N/1N/1D1 1 MCDI Ml IP\T1 ^\K~7^ 

AlroVUD, DnLriL4U, L/OrtzoA, LArb, Lb 1 b, UAr, LrAILrib, LrAI, IVllVlr 1 1, NLbL, NUUI 1, UAZI, 
P0LR2H, RPLIO, TBCD, VKORCl, ZNF37A 


nsa-min-oz 


miR-32 


r 
J 


( — 7r^rf^QA dqn/ipiq ddi in TDr'Pi 
L/OrtZoA, UULI, rblVlUo, KrLIU, 1 bLU 


1 Iba-IT iIr-j^D 


IT lln-j^ 


1 1 
\ Z 


Dl ir^^1 TAMY CIC^D CTI TTC^TI ICITK/11 PI-ID PQK/irA PI ICAH DPI ^4 QDF^^^A^ \A/nD^A 
bUUjI, LAlNlA, tirjb, r 1 L, U 1 r oL 1 , irillVII, rnb, rjlVIL^, rUrOU, nrLj^, jrUjAj, VVUnoD 


hsa-miR-99a 


miR-99 


19 


ANXA2, CSTB, DDIT4, GPATCH3, GSTPl, GTF2IRD1, MDK, MIF, NDK, NEBL, OAZl, P0LR2H, SlOOAl 1, 
SLC7A1 1, TBCD, UBA6, UBE2I, VKORCl, ZNF37A 


mmu-miR-lOb 


miR-1 0 


2 


Ras-Like protein, WDR36 


mmu-miR-151 


miR-151 


13 


ATP6V0B, BHLHE40, C7orf28A, CAPG, GTF2IRD1, MIF, MMPll, NUDTl, OAZl, P0LR2H, RPLIO, TBCD, 

VKORCl 


*mmu-miR-342 


miR-342 


21 


ABHD12, ARPCIB, CLICl, CPNEl, EIF3B, FTL, GTF3C1, IFITMl, NAP1L4, ODCl, PSMB2, PSMC4, PSMD8, 
PUF60, RPLIO, RPL34, RPS9, SRD5A3, TGFBI, TUFM, Wilm's tumor-related protein 


rno-miR-140 


miR-1 40 


1 


NDK 



mo-miR-151 miR-151 16 ANXA2, CDC42SE1, CSTB, DDIT4, GNPTG, GPATCH3, GPXl, GTF2IRD1, ITPA, MDK, MMPll, PSMB4, 

SI OOAl 1 , SLC7A1 1 , USP22, ZNF37A 



*miRNAs are selected as cancer-mediators based on the literature. 
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targets was 25 for miR-30d, and the minimum number 
of predicted targets was one for miR-1, miR-100 and 
rno-miR-140 (Table 1). The number of predicted targets 
for each member of the miR-30 family (a-e) was 22, 20, 
15, 25 and 17, respectively. Table 2 lists the mRNAs for 
each of those miRNAs which have been previously 
reported to have associations with some form of cancer. 
Table 2 also contains some additional results related to 
STEP 3. 

STEP 3. Comparisons between PLS Regression, TargetScan 
5.1 and miRanda 

Comparing the PLS regression method with other com- 
putational miRNA-target prediction algorithms such as 
TargetScan 5.1 and miRanda we found some areas of 
agreement. Thus, some of the miRNA-targets predicted 
by PLS were also predicted by either TargetScan or 
miRanda or both of them (Table 2). This table provides 
the number of miRNA-targets for those miRNAs report- 
edly linked to cancer [29-42]. It shows the percentage of 
common targets predicted by our use of the PLS regres- 
sion method, with TargetScan 5.1 and miRanda. 

Using 15 down-regulated miRNAs (let-7 g, miR-101, 
miR-133a, miR-150, miR-15a, miR-16, miR-29b, miR- 
29c, miR-30a, miR-30b, miR-30c, miR-30d, miR-30e, 
miR-34b and miR-342), known to be associated with 
cancer, we found 16.5% and 11.0% of our PLS-predicted 
miRNA-targets, on average, were also predicted as tar- 
gets for the corresponding miRNAs by TargetScan5.1 
and miRanda, respectively (Table 2). Moreover, we 
found that five out of 25 of our PLS-predicted targets 
(20%) for miR-30d were also predicted as targets for this 



miRNA by both TargetScan5.1 and miRanda. These five 
target genes predicted by our PLS regression approach, 
TargetScan 5.1 and miRanda were BHLHE40, DDIT4, 
NEBL, SLC7A11 and UBE2I. In this regard, it is inter- 
esting to note that our method identified more common 
targets with TargetScan than miRanda. From Tables 1 
and 2 it is evident that our analysis provides many novel 
targets that were not identified by these existing algo- 
rithms which may prove to be significant biomarkers for 
colon cancer. In order to further pursue the integrative 
analysis an association network for these significantly 
associated miRNAs and mRNAs was constructed (see 
STEP 5). 

STEP 4. Gene ontology (GO) terms for miRNA-targets in 
the category of biological processes 

To gain further insight into the biological functions of 
each miRNA and its predicted targets, the 71 up-regu- 
lated mRNAs from the colon cancer microarray were 
first analyzed with the aid of Pathway Studio 7 (Ariadne 
Genomics Inc., Rockville, MD, USA). This analysis was 
used to obtain the gene ontological (GO) terms for the 
category of biological processes. We selected three of 
the most significant GO terms with all of the unadjusted 
p-values less than 0.02 except one function termed DNA 
replication which had a p-value 0.05. The corresponding 
adjusted p-values were less than 0.06 and 0.13 respec- 
tively. Almost all the terms are significant at FDR 10%. 
The remaining term is significant at FDR 13%. In Table 
3 the significant GO terms associated with the predicted 
targets for 15 miRNAs linked to colorectal cancer tis- 
sues are listed as biological processes. The significant 



Table 2 Identification of miRNA targets from 15 down-regulated and 71 up-regulated mRNAs 



Cancer-related and down- 
regulated miRNA 


miRNA 
family 


No. of target 
mRNAs 


No. (%) of overlapping PLS predicted 
mRNAs with TargetScanS.I 


No. (%) of overlapping PLS predicted 
mRNAs with miRanda 


hsa-let-7 g 


Let-7 g 


16 


0 


2(12.5) 


hsa-miR-101 


miR-101 


7 


0 


0 


hsa-miR-133a 


miR-133 


17 


3(17.6) 


2(11.8) 


hsa-miR-150 


miR-150 


20 


3(15.0) 


1(5.0) 


hsa-miR-15a 


miR-15 


8 


1(12.5) 


1(12.5) 


hsa-miR-16 


miR-16 


8 


1(12.5) 


0 


hsa-miR-29b 


miR-29 


15 


3(20.0) 


2(13.3) 


hsa-miR-29c 


miR-29 


7 


2(28.6) 


1(14.3) 


hsa-miR-30a 


miR-30 


22 


3(13.6) 


3(13.6) 


hsa-miR-30b 


miR-30 


20 


4(20.0) 


2(10.0) 


hsa-miR-30c 


miR-30 


15 


1(6.7) 


1(6.7) 


hsa-miR-30d 


miR-30 


25 


7(28.0) 


5(20.0) 


hsa-miR-30e 


miR-30 


17 


3(17.6) 


3(17.6) 


hsa-miR-34b 


miR-34 


12 


3(25.0) 


0 


mmu-miR-342 


miR-342 


21 


4(19.0) 


1(4.8) 


Total 






16.5 


11.0 
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biological processes include anaphase-promoting com- 
plex (APC) -dependent proteasomal ubiquitin-dependent 
protein catabolic process, negative or positive regulation 
of ubiquitin-protein ligase activity during mitotic cell 
cycle, polyamine biosynthetic process, oxidative stress, 
oxidation reduction, oxidoreductase activity, glutathione 
peroxidase process, regulation of cell proliferation and 
apoptosis, translational elongation and lipid metabolic 
process. Interestingly, many of the target mRNAs 
expressed in the colorectal cancer tissues and associated 
with the miRNAs appear to be involved in protein 
degradation and apoptosis and are linked to the oxida- 
tive stress signal transduction pathway. Therefore, these 
data highlight the oxidative stress mechanism that is 
operative in colorectal cancer tissue. 

STEP 5. Visualization of miRNAimRNA association 
networks 

Using the statistically significant Pij -association scores 
between the i-th mRNA and j-th miRNA, derived from 
the PLS regression method, we constructed association 
networks including the significant miRNAs and 
mRNAs. The resulting miRNA:mRNA association net- 
work had 97 nodes and 155 connections between the 



31 down-regulated miRNAs and the 71 up-regulated 
mRNAs (Figure 2A). We developed a network to 
demonstrate the overlapping miRNA targets for miR- 
NAs in the miR-30 family (a-e) (Figure 2B) since these 
miRNAs have a large number of mRNA targets. The 
number of targets for miR-30 a-e was 22, 20, 15, 25 and 
17, respectively (Table 3). This network (Figure 2B) 
indicated that two mRNAs (VKORCl and CSTB; in 
red) were predicted to be targets of five miR-30 miR- 
NAs, six mRNAs (ANXA2, GPATCH3, OAZl, PSMB4, 
SLC7A11, ZNF37A; in turquoise) were targeted by four 
of the miRNAs, and ten mRNAs (CAPG, C7orf28A, 
DAP, GPXl, ITPA, MMPll, NEBL, POLR2H, SlOOAll 
and GTF2IRD1; 3 in green and 7 in blue) were targeted 
by three of the miRNAs (Figure 2B). Networks were 
also developed for the seven miRNAs (let-7 g, miR-101, 
miR-133a, miR-15a, miR-16, miR-29b and miR-29c) clo- 
sely related to cancer and their associated mRNAs (Fig- 
ure 2C). Most of the targets mentioned here have been 
reported to be linked to cancer. For example, ANXA2 
up-regulation was reported to be associated with human 
hepatocellular carcinoma, pancreatic adenocarcinoma, 
high-grade glioma, gastric carcinoma, acute promyelocy- 
tic leukemia and primary colorectal cancer [43]. Also, 



Table 3 Gene ontology (GO) terms for miRNA-targets in the category of biological processes 



Cancer-related and down- 
regulated miRNA 


miRNA 
family 


No. of 
mRNAs 


GO terms 


hsa-let-7 g 


Let-7 g 


16 


Anapliase-promoting complex(APC)-dependent proteasomal ubiquitin-dependent protein 
catabolic process, response to oxidative stress and regulation of apoptosis 


hsa-miR-101 


miR-101 


7 


Ubiquitin-dependent protein catabolic process, APC-dependent proteasonmal ubiquitin- 
dependent protein catabolic process and oxidoreductase activity 


hsa-miR-133a 


miR-133 


17 


Polyamine biosynthetic process,glutathione peroxidase process, Ubiquitin-dependent protein 
catabolic process and regulation of apoptosis 


hsa-miR-150 


miR-150 


20 


Polyamine biosynthetic process, APC-dependent proteasomal ubiquitin-dependent protein 
catabolic process. Glutathione peroxidase activity 


hsa-miR-15a 


miR-15 


8 


Anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic 
process, response to oxidative stress and proteasome complex 


hsa-miR-16 


miR-16 


8 


APC-dependent proteasomal ubiquitin-dependent protein catabolic process, negative 
regulation of cell proliferation and oxidoreductase activity 


hsa-miR-29b 


miR-29 


15 


Regulation of cell proliferation, glutathione peroxidase activity and regulation of apoptosis 


hsa-miR-29c 


miR-29 


7 


Ubiquitin-dependent protein catabolic process, APC-dependent proteasomal ubiquitin- 
dependent protein catabolic process and translational elongation 


hsa-miR-30a 


miR-30 


22 


Polyamine biosynthetic process, ubiquitin-dependent protein catabolic process, and regulation 

of apoptosis 


hsa-miR-30b 


miR-30 


20 


Ubiquitin-dependent protein catabolic process, polyamine biosynthetic process and oxidative 

stress 


hsa-miR-30c 


miR-30 


15 


Polyamine biosynthetic process, ubiquitin-dependent protein catabolic process, and regulation 

of apoptosis 


hsa-miR-30d 


miR-30 


25 


Regulation of apoptosis, glutathione peroxidase activity and ubiquitin-dependent protein 

catabolic process 


hsa-miR-30e 


miR-30 


17 


Polyamine biosynthetic process, response to oxidative stress and regulation of apoptosis 


hsa-miR-34b 


miR-34 


12 


Regulation of cell proliferation, oxidoreductase activity and translational elongation 


mmu-miR-342 


miR-342 


21 


Polyamine biosynthetic process, translational elongation and APC-dependent proteasomal 
ubiquitin-dependent protein catabolic process 
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Figure 2 miRNA-mRNA association networl^. A. miRNA-mRNA interaction networl< witli 97 nodes and 155 connections between 31 down- 
regulated miRNAs and 71 up-regulated mRNAs. miRNAs are in yellow while mRNAs are in pink. B. A sub-network depicting miRNA-mRNA 
interactions predicted from the miR-30 family. miRNAs are in yellow and mRNAs are in other colors. C. A sub-network depicting miRNA:mRNA 

interactions predicted from other cancer-associated miRNAs: let-7 g, miR-101, miR-133a, miR-15a, miR-16, miR-29b and miR-29c. 
k J 



CapG message and protein are reported to be up-regu- 
lated in colorectal cancer [44]. 

STEP 6. Pathway analysis with the aid of IPA 

To gain additional insight into prospective miRNA 
linked biological networks and canonical pathways, the 
31 down-regulated miRNAs from the colon cancer 
microarrays were analyzed with the aid of Ingenuity 
Pathway Analysis (IPA) software. We focused on two 
miRNA- associated networks related to cancer. The most 
significant (top) functional pathways highlighted by IPA 



consisted of 1) cancer, reproductive system disease, and 
genetic disorders and 2) cell death, hematological system 
development and function, and cancer. We found that 
ten of the down-regulated miRNAs (miRlOl, miR26a, 
miR26b, miRSOa, miRSOb, miRSOd, miRSOe, miR34b, 
miR-let7 g and miRN140) were grouped together in a 
functional network (Figure 3A) and nine of the down- 
regulated miRNAs (miR-130a, miR-133a, miR-142, miR- 
150, miRlSa, miR-16, miR-29b, miR-30c and miR-99a) 
were grouped together in a second network (Figure 3B). 
These are colored in green in Figures 3A and 3B. 
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Figure 3 Top miRNA networks. A. miRNA-associated network with most statistically significant (top) functions related to cancer, reproductive 
system disease, and genetic disorders. The ten miRNAs, colored green were presented in the list of 31 down-regulated miRNAs (Table 1). The 
genes with white color are not from our data set and recruited by the IPA knowledge base. B. miRNA-associated network with top functions 
related to cell death, hematological system development and function, and cancer. The nine miRNAs, colored green were also present in the list 
of 31 down-regulated miRNAs. 



In parallel, we used IPA to investigate the 71 up-regu- 
lated mRNAs. We were particularly interested in finding 
biological networks, functions and canonical pathways 
associated with these mRNAs that may be linked to 
human cancers. Therefore, we focused on the four most 
statistically significant networks with functions related 
to tumor morphology, cell signaling, immunological dis- 
ease, cellular growth and proUferation (Table 4), func- 
tions of cancer (Table 5) and cancer-associated 
canonical pathways (Table 6). The second column of 
Table 4, 5 and 6 includes all genes associated with the 



top four networks, function, or canonical pathways of 
cancer which are also predicted by the PLS regression 
method as possible miRNA targets. 

STEP 7. Combinatorial analysis with the aid of the PLS 
method and IPA pathway designer 

For the above pathway analysis results from 31 down- 
regulated miRNAs and 71 up-regulated mRNAs in the 
previous step, we were interested in knowing which 
miRNAs were associated with the mRNAs grouped in 
the above cancer-related networks, functions and 



Table 4 15 miRNAs and target genes involved in the cancer-related gene networks 



Top 
Network 


Target genes 


Top functions 


The nos. of 

miRNAs 
associated 


Network 
1 


BHLHE40, CANX, CDC42SE1, GPXl, GPX4, GSTPl, MDK, MIF, MMPl 1, OAZl, 
ODCl, PPIB, PRDX2, PSMA3, PSMB2, PSMB4, PSMC2, PSMC4, PSMD8, TGFBI, 

UBE2I 


small molecule biochemistry, drug 
metabolism and tumor morphology 


15 


Network 
2 


ANXA2, CAPG, CLICl, CPNEl, CSTB, DAP, EIF3B, GUKl, IFITMl, RPL34, RPS9, 
SLC7A1 1, S0X4, TBCD, TUFM, USP22 


protein synthesis, cell signaling and 
interaction, and reproductive system 
function. 


14 


Network 
3 


ATP6V0B, BHLHE40, COTLl, CYBA, DDIT4, FTL, ITPA, LAPTM5, NEBL, NUDTl, 
RPS2, SlOOAll, SORD 


genetic disorder, immunological disease 
and free radical scavenging 


14 


Network 
4 


ANXA2, ARPCIB, BUD31, GNPTG, GPATCH3, GTF3C1, NAP1L4, PHB, P0LR2H, 
PUF60, UBA6, VKORCl, WDR36 


gene expression, cellular development, 
cellular growth and proliferation 


14 
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Table 5 15 rniRNAs and target genes involved in the cancer-related functions 



Cancer, related 
Function 


PLS predicted Target genes in the respective functions 


The nos. of cancer related 
miRNAs associated with the 
mRNA targets 


Cancer 


ANXA2, BHLHE40, CANX, CAPG, COTLl, CYBA, DDIT4, FTL, GPXl, GSTPl, IFITMl, MIF, MMPl 1, 
OAZl, ODCl, S100A11, SLC12A8, SLC7A11, S0X4 and UBE2I (20) 


15 


Tumorigenesis 


ANXA2, BHLHE40, CANX, CAPG, COTLl, CYBA, DDIT4, FTL, GPXl, GSTPl, IFITMl, MIF, MMPl 1, 
NUDTl, OAZl, ODCl, PRDX2, SlOOAll, SLC12A8, SLC7A11, S0X4 and UBE2I (22) 


15 


Carcinoma 


ANXA2, BHLHE40, CANX, COTLl, CYBA, FTL, GSTPl, IFITMl, MMPll, OAZl, ODCl, SlOOAll 

and SLC7A11 (13) 


12 


Malignant tumor 


ANXA2, BHLHE40, CANX, COTLl, CYBA, FTL, GPXl, GSTPl, IFITMl, MIF, MMPl 1, OAZl, ODCl, 
SlOOAll, SLC7A11 and S0X4 (16) 


12 


Primary tumor 


ANXA2, BHLHE40, CANX, COTLl, CYBA, FTL, GPXl, GSTPl, IFITMl, MIF, MMPl 1, OAZl, ODCl, 
PRDX2, SlOOAll, SLC7A11 and S0X4 (17) 


12 


Angiogenesis 


ANXA2, CANX and GPXl (3) 


8 



pathways predicted by our PLS regression method. With 
the aid of IPA pathway designer we expanded the above 
networks, functions and pathways generated by IPA 
(Figure 3, Tables 4, 5 and 6) by combining the PLS 
regression method and IPA pathway. 

At first, we examined the two networks that were 
mediated by miRNAs and that were generated from the 
31 down-regulated miRNAs (Figure 3A and 3B). With 
the aid of the IPA pathway designer, we connected 
miRNA-targets predicted by the PLS regression method 
with each associated miRNA labeled as my list (ML in 
Figure 4A and 4B). These miRNAs in ML were con- 
nected to the two networks (Figure 3A and 3B) through 
their predicted targets by IPA. The new expanded net- 
works from Figure 3A and 3B are shown in Figure 4A 
and 4B. In Figure 4A, we found that 27 of 31 down- 
regulated miRNAs were associated with the miRNA- 
associated network through 15 genes (ANXA2, 
ARPCIB, BHLHE40, GPXl, MIF, OAZl, ODCl, PHB, 
PSMB4, RAN, RPLIO, RPL34, RPS9, TUFM, UBE2I) 



predicted by the PLS regression method. These mRNAs 
are all included in the list of predicted target in Table 2. 
It was interesting to see that the four remaining miR- 
NAs (miR-100, rno-miR-140, miRlSa and miR-26a) 
were grouped in this network by IPA and not connected 
through the above cancer-related target mRNAs by the 
pathway designer and hence two of them (miR-100 and 
miR-15a) were not included in Figure 4A. In this net- 
work, many of the cancer-affected target mRNAs were 
mainly regulated by the transcription factors MYCN and 
FOXOl which play a vital role in cell death, apoptosis, 
proliferation and survival. Particularly MYCN was 
involved in the ERK/MAPK signaling pathway and 
cancer. 

In a similar manner in Figure 4B, we found that 12 of 
31 -miRNAs were grouped with the miRNA-associated 
network through their predicted targets (BHLHE40, 
GPXl, MDK, S100A9 and UBE2I). These cancer-asso- 
ciated target mRNAs were connected to this network 
mainly through the interaction with apoptosis and cell 



Table 6 15 miRNAs and target genes in the cancer-related canonical pathways 



Cancer related canonical pathway 


PLS predicted Target genes in the respective 
pathways 


The nos. of cancer related miRNAs associated with 
the mRNA targets 


Protein Ubiquitination Pathway 


PSMA3, PSMB2, PSMB4, PSMC2, PSMC4, PSMD8, 
UBE2I, USP22(8) 


14 


Polyamine Regulation in Colon 
Cancer 


OAZl, ODCl, PSMA3, PSMB2, PSMB4, PSMC2, 
PSMC4, PSMD8(8) 


15 


Purine Metabolism 


ATP6V0B, GUKl, ITPA, P0LR2H, PSMC2, PSMC4 (6) 


11 


Circadian Rhythm Signaling 


BHLHE40(1) 


2 


Hypoxia Signaling in the 
Cardiovascular System 


UBE2I(1) 


4 


Cdc42 Signaling 


ARPCIB, CDC42SE1(2) 


4 


Pyrimidine Metabolism 


ITPA, P0LR2H (2) 


7 


mTOR Signaling 


DDIT4, EIF3B (2) 


7 


NRF2-mediated Oxidative Stress 
Response 


FTL, GSTPl, PPIB(3) 


7 


Xenobiotic Metabolism Signaling 


FTL, GSTPl (2) 


7 
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Figure 4 Overlaid networl^s predicted by the PLS method and generated by IRA pathway designer. A. Depicted is an overlaid miRNA- 
mediated networl< consisting of tine top networl< 1 from Figure 3A on tine left side of this network (green-colored miRNAs) and 27 miRNAs 
labeled with ML on the right that were significantly associated with the pink-colored mRNAs that were predicted by PLS. B. Depicted is an 
overlaid miRNA-mediated network consisting of the top network 2 from the Figure 3B on the left side of this network (green-colored miRNAs) 
and 12 miRNAs labeled with ML on the right that were significantly associated with the pink-colored mRNAsthat were predicted by PLS. 



death genes such as BCL2, CASP9 and ELKl (Figure 
4B). 

Next we examined the expanded IP A results from 71 
up-regulated mRNAs which were shown in the last col- 
umn of Tables 4, 5 and 6. At first we wanted to know 
which of the 31 down-regulated miRNAs were asso- 
ciated with the top four networks through their target 
mRNAs predicted by PLS regression method. With the 
aid of IP A pathway designer, we found that 27 of the 31 
down-regulated miRNAs were linked to one or more 
mRNA networks and 20 of them (let-7 g, miR-101, 
miR-126, miR-133a, miR-142-5p, miR-150, miR-15a, 
miR-26b, miR-28, miR-29b, miR-30a, miR-30b, miR-30c, 
miR-30d, miR-30e, miR-34b, miR-99a, mmu-miR-151, 
mmu-miR-342 and rno-miR-151) were involved in all of 
the top 4 networks. By restricting our attention to only 
the 15 cancer-associated miRNAs in the top four 
mRNA networks, we found that all 15 miRNAs were 
involved in network 1, all but miR-16 were in network 
2, and all but miR-29c were in network 3 and 4, as 
shown in the fourth column of Table 4. 

We also checked which miRNAs associated with the 
mRNA targets predicted by PLS regression method were 
linked to the cancer-related function. Through their 



targets, we found that 26 out of the 31 down-regulated 
miRNAs were associated with these targets and were in 
some ways related to cancer, 24 miRNAs were in the 
pathway of polyamine regulation in colon cancer and 12 
miRNAs were involved in angiogenesis pathway. Again, 
we were interested in examining the 15 miRNAs directly 
linked with cancer-and their biological functions. We 
found that all 15 miRNAs were involved in cancer and 
tumorigenesis, 12 of them (all except miR-101, miR-15a 
and miR-29c) were in carcinoma, malignant tumor and 
primary tumor and 8 of them (all except let-7 g, miR- 
101, miR-150, miR-15a, miR-16, miR-29c and miR-342) 
were in angiogenesis as were shown in the last column 
of Table 5. 

Furthermore, we examined which associated miRNAs 
among the 15 cancer-related miRNAs were involved in 
the canonical pathways associated with cancer. We 
found that 14 miRNAs were associated with protein ubi- 
quitination pathway, all 15 miRNAs were in polyamine 
regulation in colon cancer, 11 miRNAs were associated 
with purine metabolism, 2 miRNAs were associated 
with circadian rhythm signaling, 4 miRNAs each were 
associated with hypoxia signaling in cardiovascular sys- 
tem and cdc42 signaling, and 7 miRNAs each were 
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associated with pyrimidine metabolism, mTOR signaling 
or NRF2-mediated oxidative stress response or xenobio- 
tic metabolism signaling along and their targets are 
mentioned in the last column of Table 6. 

Overall, it clearly showed that these cancer-related 
miRNAs may mediate many biological functions and 
pathways by regulating the target genes predicted by our 
integrative and computational method. 

STEP 8. Estimating false detection rate for PLS based 
target prediction 

In order to determine the false detection rate of our 
mRNA target prediction method using PLS regression we 
used a simulation experiment. As defined in [22], the false 
detection rate is the percentage of detected significant 
miRNA-mRNA pairs derived as output from the PLS 
method out of the total possible pairs that would have the 
same or better associations by chance. Based on this pro- 
cedure, the estimated false detection rate for the threshold 
FDR of 0.3 was 2.50%. We then reported the estimated 
false detection rates for many values of the thresholds 
simultaneously, and the estimated false detection rate is 
plotted as a function of the threshold in Figure 5. 

We also compared the PLS regression method with a 
simpler method based only on the correlations for all 
miRNA-mRNA pairs. A correlation threshold of -0.750 



corresponded to an estimated false detection rate of 
2.50%. The PLS based method was more sensitive in 
detecting targets when applied to the actual data set; 
only 33 miRNA-mRNA connections were declared to be 
significant based on the correlation analysis with a 
threshold -0.750, whereas 155 connections were signifi- 
cant for the FDR threshold of 0.3. 

Discussion 

Integrative computational modeling of miRNA targets 
using PLS 

Although there are many computational algorithms for 
predicting miRNA targets, they vary widely with regards 
to specific targets identified. The relative usefulness of 
these algorithms is strengthened if there are some agree- 
ments in their conclusions. However, that is usually not 
the case among the existing algorithms. Additionally, 
the existing algorithms depend on the sequence comple- 
mentarity of the miRNA-mRNA pairs. So, more com- 
plex algorithms using joint expressions of miRNA and 
mRNA may be useful to predict miRNA targets. In this 
study we developed an approach for predicting miRNA- 
mRNA associations based upon a statistical analysis 
using unpaired t-tests, PLS regression based association 
scores and the known inverse relationship between cer- 
tain mRNAs and miRNAs. 




Threshold FDRforq-vafifes 

Figure 5 Estimated false detection rates for different thresholds. Various thresholds for the q-values are on the x-axis. The estimated false 
detection rate corresponding to each threshold is given on the y-axis. For the selected threshold of 0.3, the estimated false detection rate is 
2.50%. 
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Our efforts focused on a set of 31 significantly down- 
regulated miRNAs and a set of 71 significantly up-regu- 
lated mRNAs identified by the t-test for colon cancer vs. 
the control samples after multiple hypotheses correc- 
tions. We selected these miRNAs and mRNAs based on 
their inverse relationship for our integrative analysis. 
Using miRNAs as independent variables and mRNAs as 
dependent variables in our PLS regression method we 
identified significant miRNA-mRNA associations and in 
turn predicted mRNA targets for individual miRNAs 
(Table 1). When comparing these to miRNA-targets 
predicted by TargetScan 5.1 and miRanda, we found on 
average about 16.5% and 11.0% of our predicted targets 
matched those predicted by these two programs, respec- 
tively. Furthermore, some of our predicted target genes 
lie within the union or intersection of the other two tar- 
get prediction programs. It seems likely that the miRNA 
target genes that lie within such intersections could be 
more accurate, although this claim remains to be evalu- 
ated further, experimentally. Interestingly, many of the 
targets predicted by our methods are not predicted by 
the other prediction algorithms and thus our approach 
provides possible novel targets that need further investi- 
gation for relevance to colorectal cancer {Step 7}. We 
estimated the false detection rate of the proposed 
method through a simulation study by sampling from 
the real data under the null model. A limitation of the 
present study is that we are unable to compute statisti- 
cal performance measures such as the false discovery 
rate since there does not exist any realistic simulator 
currently for generating miRNA and mRNA datasets 
simultaneously with controlled biological interactions 
between them. 

We have also compared our computational method of 
target prediction with another computational method of 
target prediction by association resulting from pairwise 
simple correlations between all 31 the up-regulated 
miRNAs and 71 down-regulated mRNAs. For the same 
dataset using the same threshold we get many more tar- 
gets and so the PLS based method proved to be more 
sensitive than just correlation based target prediction 
described in Step 8 in the Results section. 

Conclusion 

In summary, we have developed a bioinformatics com- 
putational approach for the analysis of miRNA and 
mRNA expression profiles derived from the same set of 
cancerous tissue samples to infer the molecular mechan- 
ism for the colon cancer etiology. This analysis provides 
an alternative computational approach to identify puta- 
tive miRNA targets and reverse engineer the miRNA 
networks with their mRNA targets. We have developed 
a PLS modeling approach to take advantage of the 
known inverse relationships between some miRNAs and 



their target mRNAs. This approach would certainly have 
to be modified for those situations where such a rela- 
tionship does not exist. Our analysis resulted in 3 major 
outcomes. 1) First, several of the predicted miRNA tar- 
gets identified with our approach have also been pre- 
dicted by other target prediction algorithms, which 
provide some support for the development of our meth- 
ods. 2) The PLS regression approach adopted here has 
resulted in the recognition of novel miRNA-mRNA sets 
(in turn miRNA targets). These putative data-sets could 
provide better understanding of the underlined molecu- 
lar mechanism of colon cancer or any other complex 
disease and identify novel therapeutic targets upon 
experimental validations. 3) An in silico analysis of the 
biological significance of the results obtained with our 
PLS approach and pathway analysis indicates that pro- 
cesses related to protein degradation and cell death per- 
haps initiated by oxidative stress are likely involved in 
colorectal cancer. This fact has been concluded by other 
experimental studies [45]. Overall, it clearly shows that 
these down-regulated miRNAs for colon cancer can 
most likely be involved in many altered biological func- 
tions along with their targets. The utility of our 
approach and its possible applicability to other disease 
states awaits further experimental validation. 

Methods 

miRNA and mRNA microarray data and gene expression 
profile analysis 

The miRNA and mRNA microarray data sets for both 
human colon tumors and normal tissues used here 
were obtained from the Broad Institute and down- 
loaded from their web-accessible database: http://www. 
broadinstitute.org/cgi-bin/cancer/datasets.cgi [20]. The 
data sets originally consisted of five normal human tis- 
sues and 10 human colon tumors for both miRNA and 
mRNA. However, only four of the normal tissue sam- 
ples and seven of the colon tumor samples passed 
quality control criteria [46]. The mRNA expression 
values were obtained using Affymetrix GENECHIP 
analysis software (after hybridization to Affymetrix oli- 
gonucleotide microarrays Hu6800 and Hu35KsubA) 
containing a total of 16,063 probes [46]. The miRNA 
median fluorescence intensity values were measured by 
using a new, bead-based flow cytometric miRNA 
expression profiling method. All data were filtered by 
minimum value 32 (values less than 32 were consid- 
ered as a background value according to the reference) 
and log2 transformed [20]. 

The miRNA and mRNA expression data sets for colon 
cancer patients were loaded into GeneSpring GX 10.0 
(Agilent Technologies). An unpaired t-test was used to 
compare miRNA and mRNA expressions for normal 
and tumor tissue samples. The p-values obtained were 
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then adjusted with the aid of the Benjamini-Hochberg 
false discovery rate (FDR) [28] correction method. The 
differentially expressed miRNAs or mRNAs with q- 
values less than or equal to 10% and the absolute fold 
changes greater than or equal to 1.2 were determined to 
be significant. 

miRNA target prediction using partial least square (PLS) 
method and bootstrapping 

An algorithmic version of the whole procedure is pro- 
vided in Figure 1. A detailed description of the PLS 
regression method used in this paper was previously 
reported in [23] and [27]. Briefly, consider a data set 
with n observations of p miRNAs (predictor variables) 
and m mRNAs (response variables). Let Y = {Yii,Y2v — > 
y^/)' be the column vector of observed values for the /th 
mRNA;/ = 1,2, ... ,m, and let Xj = {Xy, X2J, X^j)' be 
the column vector of observed values for the ;th miRNA 
for ; = 1,2, . For each mRNA Y^, PLS fits a regression 
model of the form 

ei 

where is an w-dimensional vector of residuals and 
is the k-th latent variable amongst the set of all 
orthogonal latent variables (/<<</?), all of which are lin- 
ear combinations of the covariates. All of the original 
variables are standardized before doing PLS. 

Using ordinary least squares leads to 

and then the coefficients associated with the original 
covariates can be obtained from the coefficients of the 
latent variables. 

Specifically, after q latent factors are found by select- 
ing the number of component as 3, we calculated the 
scaled score coefficient /f^. Then the score coefficients 
Pij and intercept can be obtained by transforming yik 

back to the original scale as follows - ^=1 ^ r ^> 



oii = Yi — PijX. These coefficients are called the PLS 
scores, and these scores are defined as the associations 
between each response variable (mRNA) and the 
miRNA covariate fiij for a specific / and 

Statistical test of the significance of the association 
between miRNA and mRNA ifiij) 

In order to test the significance of the association 
between miRNA and mRNA, we tested the null 



hypothesis is Hq: Pij = 0, (which implies there is no sig- 
nificant association between a specified miRNA and 
mRNA) versus the alternative H^: Pij< 0( which implies 
there is an inverse association between them). As we do 
not know the asymptotic distribution of such associa- 
tions we used a bootstrap re-sampling scheme to con- 
struct the sampling distribution of the score Pij for 
testing the hypothesis. 

We ran PLS after removing the yth miRNA gene, to 
generate new Pij and we compute the residual of the 
model which is the estimate of the error e^. 

After the residuals are computed, bootstrapping resi- 
duals e*i,e*2, . . . ,e^ri generated by simple random 
sampling with replacement of the components en.ea,,^., 
Ciyi of the residual vector e/. Then the bootstrapped 7* 
data can be found by the original regression equation: 

P 

where the parameters hold their usual meanings. We 
then run PLS with the new 7* and original full set of 
miRNAs, and store the bootstrapped PLS based score 
Pij, We did this bootstrapping step 1000 times for each 
gene and calculate the proportion of times when the 
negative bootstrapped association scores were smaller 
than negative observed association score Pij < Pij. These 
proportions are the estimated p-values of the tests. The 
Benjamini & Hochberg FDR [28] then is used to calcu- 
late q-values. A threshold FDR was arbitrarily chosen as 
0.3 since the number of variables in this test was small 
after having identified the inverse relationship between 
miRNA and mRNA. The associations for which the q- 
values were less than equal to the FDR cutoff were 
deemed to be significant. 

Simulation Experiment 

A simulation method similar in concept to the one pro- 
posed in [22] was used to estimate the false detection 
rate of possible targets using PLS regression for a speci- 
fic threshold of FDR. The definition of the false detec- 
tion rate was provided before in Step 8 of the Results 
section, as described on page 4 of [22]. Note that the 
concept of false detection rate refers to the simulation- 
based estimate of the percentage of detected significant 
miRNA-mRNA pairs derived by chance, and when using 
this concept, we refer to it in full as false detection rate 
so as not to confuse it with the abbreviation FDR which 
we have already reserved for the more commonly-used 
false discovery rate [28]. While we still use the FDR to 
control false positives for our PLS method, we use the 
false detection rate for this simulation to determine 
equivalent thresholds for our method and for a simple 
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correlation method which does not choose its threshold 
based on FDR. 

To generate a simulated data set with n "unassociated" 
observations of p miRNAs and m mRNAs, we randomly 
permuted the order of the n observations for both the 
miRNA genes and the mRNA genes so that none of the 
miRNA arrays corresponded to its true mRNA array. 
Specifically, we randomly permuted the labels of both 
arrays and rejected any generated data sets for which 
the new labels matched. Therefore, since the miRNAs 
do not correspond to the mRNAs in the simulated data 
set, any low q-values only occur by chance. For the 
simulated data set, we then compute the number of 
false positives by counting the number of q-values 
which fell below the threshold. The process was 
repeated for 1000 simulated data sets and the estimated 
false detection rate was computed by dividing the total 
number of false positives for the 1000 data sets by the 
overall total number of miRNA-mRNA pairs for the 
1000 data sets. 

We also compared the PLS regression with a simpler 
method based only on the correlations for all miRNA- 
mRNA pairs. In the correlation method, we declare that a 
miRNA-mRNA pair is significant if the corresponding 
pairwise correlation fell below a certain threshold. To 
compare the sensitivity of the methods, we used the same 
simulation method described above, and we chose the 
threshold for which the estimated false detection rate was 
the same as that found for the PLS regression method. 

Biological function and pathway analysis 

To obtain gene ontology (GO) terms for each miRNA, 
we used Pathway Studio software (Ariadne Genomics 
Inc., Rockville, MD, USA). This software enabled an 
enrichment analysis of expressed genes for GO terms 
using the differentially expressed miRNA target genes 
with respect to the category of biological process. The 
GO terms were selected by significant p-value and can- 
cer-associated biological functions. 

Using Ingenuity Pathway Analysis System and pathway 
designer (IPA 8.5, Ingenuity Systems, Edwood, CA, 
USA), we performed more detailed functional analysis 
to identify miRNA-mediated, cancer-related and statisti- 
cally significant networks, biological functions and cano- 
nical signaling pathways for both differentially expressed 
miRNAs and mRNAs. 

Additional material 



Additional file 1: Differentially expressed miRNAs in human colon 
tissues. Supplemental Table 2 (52): This file is a tab-delimited text file 
and contains 2 up-regulated and 31 down-regulated miRNAs with FDR 
corrected p-value less than 0.1 and absolute fold changes greater than 
1.2. It also provides detailed information about each miRNA such as its 
name, expression values and description. 



Additional file 2: Differentially expressed mRNAs in human colon 
tissues. Supplemental Table 1 (SI): This file is a tab-delimited text file 
and contains 73 up-regulated and 63 down-regulated mRNAs with FDR 
corrected p-value less than 0.1 and absolute fold changes greater than 
1.2 It also provides detailed information about each mRNA such as its 
name, expression values and description. 
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